Goals of Data Visualization
We aim to display the distribution of asthma cases from 2011-2021 using box or violin plots, and to use those distributions to decide which states' temperature data to plot, focusing on the states with the highest concentration of asthma cases. We intend to draw insight into the following questions regarding our hypothesis:
The merging process combines three datasets (asthma data, weather data, and geographic shapefiles) into a single comprehensive dataset (final_df) that can be used for mapping and statistical analysis. This merged dataset is the foundation for the subsequent visualizations and analyses, enabling the exploration of relationships between asthma prevalence, geographic regions, and environmental factors.
The code below shows how we merged the datasets for our visualizations.
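library(tidyverse)
library(sf)
library(usmap)
library(plotly)
library(DT)

# State geometries from usmap
shape_files = usmap::us_map()

asthma_df =
  read_csv("data/asthma_data.csv") |>
  mutate(year = year_name)
weather_df = read_csv("data/temp_data.csv")

# Join asthma and temperature records on state and year
asthma_weather =
  asthma_df |>
  left_join(weather_df, by = c("state", "year"))

# Attach geometry; drop_na() removes any row with a missing value
final_df =
  shape_files |>
  mutate(state = abbr) |>
  left_join(asthma_weather, by = "state") |>
  drop_na()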
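Because `drop_na()` removes every row with a missing value in any column, a state or year that fails to match in either join disappears without warning. The following is a minimal sketch of a check that could be run before dropping rows. It uses base R only, and `asthma_toy` and `weather_toy` are hypothetical stand-ins with made-up values, not the project data.

```r
# Hypothetical toy tables standing in for asthma_df and weather_df
asthma_toy <- data.frame(
  state = c("NY", "NY", "TX", "ME"),
  year = c(2020, 2021, 2021, 2021),
  prevalence_percent = c(9.8, 10.1, 7.4, 12.0)
)
weather_toy <- data.frame(
  state = c("NY", "NY", "TX"),
  year = c(2020, 2021, 2021),
  avg_temp = c(9.1, 9.4, 19.2)
)

# Left join that keeps every asthma row, mirroring the merge above
joined <- merge(asthma_toy, weather_toy, by = c("state", "year"), all.x = TRUE)

# Rows that a later drop_na() would remove
unmatched <- joined[!complete.cases(joined), c("state", "year")]
n_dropped <- nrow(unmatched)
print(unmatched)
```

Listing the unmatched keys first makes it possible to tell whether rows are lost to a genuine gap in the data or to a mismatch in state codes or year formats.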
Adult asthma prevalence by state in 2021 is shown below:
# Named prevalence_map so the object does not share a name with ggplot()
prevalence_map =
  final_df |>
  filter(year == 2021) |>
  ggplot() +
  geom_sf(aes(fill = prevalence_percent), color = "white") +
  scale_fill_viridis_c(na.value = "grey90") +
  theme_minimal() +
  labs(
    title = "Adult Asthma Prevalence by State (2021)",
    fill = "Prevalence (%)"
  ) +
  theme(
    panel.grid = element_blank(),
    axis.text = element_blank(),
    axis.title = element_blank(),
    axis.ticks = element_blank()
  )

# Convert the static map to an interactive plotly widget
prevalence_map |>
  ggplotly()
Regions in the Northeast and Mid-East of the US had a higher prevalence of adult asthma, as shown by the lighter green and yellow hues. This is surprising because these are not southern states, which experience warmer temperatures and seasons. We explore these trends further next.
# Average prevalence across states for each year
aggregated_data =
  final_df |>
  group_by(year) |>
  summarise(avg_prevalence = mean(prevalence_percent, na.rm = TRUE))

aggregated_data |>
  ggplot(aes(x = year, y = avg_prevalence)) +
  geom_line() + # Line for the time series
  geom_point(color = "red") + # Scatter points
  labs(
    title = "Adult Asthma Trend Across the US Over Time",
    x = "Year",
    y = "Average Asthma Prevalence (%)"
  ) +
  theme_minimal()

The first figure shows that adult asthma prevalence varies across the US. Moreover, average prevalence has been steadily increasing over time. Can this increase, and the differences between states, be associated with rising temperatures and with temperature differences across regions of the country?
The box plot method provided a clearer visual representation compared to the violin density plot, effectively illustrating the distribution of data and highlighting the concentration of values within each state.
ggplotly(
  final_df |>
    # States are ordered on the x-axis by mean prevalence (reorder's default)
    ggplot(aes(x = reorder(full, prevalence_percent), y = prevalence_percent, fill = state)) +
    geom_boxplot() +
    labs(title = "Distribution of Adult Asthma Across States") +
    xlab("State") +
    ylab("Asthma Prevalence (%)") +
    theme(
      axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5),
      legend.position = "none"
    )
)
This plot shows the distribution of asthma prevalence in each state across all years from 2011 to 2021. States further to the right have higher prevalence. States in the upper Northeast, such as Maine, Rhode Island, Vermont, and New Hampshire, had consistently higher asthma prevalence over this period than other states. The states with some of the lowest prevalence were Texas, South Dakota, Florida, Nebraska, and Minnesota.
ggplotly(
  final_df |>
    # States are ordered on the x-axis by mean temperature (reorder's default)
    ggplot(aes(x = reorder(full, avg_temp), y = avg_temp, fill = state)) +
    geom_boxplot() +
    labs(title = "Distribution of Temperature Across States") +
    xlab("State") +
    ylab("Temperature (°C)") +
    theme(
      axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5),
      legend.position = "none"
    )
)
When comparing states with higher asthma prevalence to their temperature distributions, we observe that states such as Maine, Rhode Island, Vermont, and New Hampshire exhibit lower median average temperatures and larger interquartile ranges. This variation reflects the significant seasonal temperature fluctuations in these regions. Notably, when comparing the upper and lower 25% of temperature distributions in these states, the temperatures in the upper quartile are more variable. This variability is important to highlight, as previous studies have suggested that extreme temperatures may increase the risk of asthma exacerbation. Overall, varying temperature conditions (both high and low extremes) might influence asthma prevalence across states.
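The medians and interquartile ranges read off the box plots can also be computed directly, which makes the comparison between states less dependent on visual judgment. The sketch below uses base R on a hypothetical toy table (`temps_toy`, with made-up temperatures for two states); applying the same idea to `final_df` would require dropping the geometry column first.

```r
# Hypothetical toy temperatures in °C; not the project data
temps_toy <- data.frame(
  state = rep(c("ME", "FL"), each = 4),
  avg_temp = c(-8, 5, 19, 7, 16, 22, 28, 23)
)

# Median and interquartile range of temperature for each state
med <- tapply(temps_toy$avg_temp, temps_toy$state, median)
iqr <- tapply(temps_toy$avg_temp, temps_toy$state, IQR)

temp_spread <- data.frame(
  state = names(med),
  median = as.numeric(med),
  iqr = as.numeric(iqr)
)
print(temp_spread)
```

In this toy example the colder state also has the wider interquartile range, which is the pattern described above for the northeastern states.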
The analysis reveals state-by-state trends in asthma prevalence over time. For states with significant positive slopes, public health initiatives should focus on identifying the underlying causes (such as environmental factors, healthcare access, or socio-economic disparities) driving the increase in asthma cases. Conversely, states with non-significant or negative trends may reflect the effectiveness of existing public health policies or different underlying factors contributing to asthma prevalence.
# Compute state trends: one linear model of prevalence on year per state
state_trends <- final_df |>
  group_by(state) |>
  summarise(model = list(lm(prevalence_percent ~ year))) |>
  mutate(model_summary = map(model, broom::tidy)) |>
  unnest(model_summary) |>
  filter(term == "year") |>
  select(state, slope = estimate, p_value = p.value) |>
  st_drop_geometry()

# Render the interactive table with 4 decimal places
datatable(
  state_trends,
  options = list(
    pageLength = 10, # Default number of rows displayed
    lengthMenu = c(5, 10, 25, 50, 100), # Options for rows per page
    scrollX = TRUE # Enable horizontal scrolling if needed
  ),
  rownames = FALSE # Remove row names
) %>%
  formatRound(columns = c("slope", "p_value"), digits = 4)
In summary, while some states show increasing asthma prevalence over time, a deeper investigation into local factors is needed to understand these trends. The interactive table provides a clear and concise way to explore these trends across states, helping to prioritize actions where they are most needed.
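Because the table reports one test per state, some slopes can fall below 0.05 by chance alone. One common guard, which is not part of the original analysis, is to adjust the p-values for multiple comparisons, for example with the Benjamini-Hochberg method in base R's `p.adjust()`. The sketch below uses a hypothetical five-row stand-in for `state_trends` with illustrative values only.

```r
# Hypothetical stand-in for state_trends; values are illustrative only
trends_toy <- data.frame(
  state = c("A", "B", "C", "D", "E"),
  slope = c(0.06, 0.03, -0.06, -0.06, 0.06),
  p_value = c(0.009, 0.260, 0.046, 0.012, 0.019)
)

# Benjamini-Hochberg adjustment controls the false discovery rate
trends_toy$p_adjusted <- p.adjust(trends_toy$p_value, method = "BH")

# Flag slopes that remain below 0.05 after adjustment
trends_toy$significant <- trends_toy$p_adjusted < 0.05
print(trends_toy)
```

A slope whose raw p-value sits just under 0.05 may no longer be flagged after adjustment, which helps prioritize the states where the trend is most robust.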
# Correlation between temperature and prevalence within each state
cor_results <- final_df |>
  group_by(state) |>
  summarise(
    cor_test = list(cor.test(avg_temp, prevalence_percent)),
    .groups = "drop"
  ) |>
  rowwise() |>
  mutate(
    corr = cor_test$estimate, # Extract correlation coefficient
    p_value = cor_test$p.value, # Extract p-value
    geom_summary = st_as_text(st_centroid(geom)) # Convert geometry to text (centroid)
  ) |>
  ungroup() |>
  select(state, corr, p_value, geom_summary) # Exclude the `geom` column

# Render the interactive table
datatable(
  cor_results,
  options = list(
    pageLength = 10,
    lengthMenu = c(5, 10, 25, 50, 100),
    scrollX = TRUE
  ),
  rownames = FALSE
) %>%
  formatRound(columns = c("corr", "p_value"), digits = 4)
The correlation between prevalence and average temperature is weak in every state. This may mean we need to stratify by other factors, such as income level or race.
line_plotly =
  plot_ly(
    data = temp_yearly_df,
    x = ~year_name,
    y = ~avg_temp_yearly,
    color = ~state,
    type = "scatter",
    mode = "lines"
  ) %>%
  layout(
    title = "Yearly Average Temperature by State Over Time",
    xaxis = list(title = "Year"),
    yaxis = list(title = "Yearly Average Temperature")
  )

heat_plotly =
  plot_ly(
    data = temp_yearly_df,
    x = ~year_name,
    y = ~state,
    z = ~avg_temp_yearly,
    type = "heatmap",
    colorscale = "Viridis",
    # The colorbar title is a property of the heatmap trace, not of layout()
    colorbar = list(title = "Yearly Avg")
  ) %>%
  layout(
    title = "Heatmap of Yearly Averages by State and Year",
    xaxis = list(title = "Year"),
    yaxis = list(title = "State")
  )

line_plotly
heat_plotly
As seen in the graphs above, average temperatures varied across states in each year and also varied within each state over the years. How these variations affect asthma prevalence is explored on the maps and regression pages.
# Bar plot of mean temperature and line plot of mean prevalence for each state
merged_df |>
  mutate(state = reorder(state, avg_temp_yearly)) |>
  group_by(state) |>
  summarize(
    prevalence = mean(prevalence_percent),
    temp = mean(avg_temp_yearly)
  ) |>
  ggplot(aes(x = state)) +
  geom_bar(aes(y = temp), stat = "identity", fill = "skyblue") +
  geom_line(aes(y = prevalence, group = 1)) +
  geom_point(aes(y = prevalence), color = "red") +
  # Both axes share one scale; the secondary axis only adds a second label
  scale_y_continuous(
    name = "Temperature",
    sec.axis = sec_axis(~., name = "Prevalence (%)")
  ) +
  labs(title = "Temperature and Prevalence by State") +
  theme(
    axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5)
  )

# Static version of the per-state correlation table
cor_results =
  final_df |>
  group_by(state) |>
  summarise(
    cor_test = list(cor.test(avg_temp, prevalence_percent)),
    .groups = "drop"
  ) |>
  rowwise() |>
  mutate(
    corr = cor_test[["estimate"]],
    p_value = cor_test[["p.value"]]
  ) |>
  ungroup() |>
  select(-cor_test) |>
  st_drop_geometry() |>
  knitr::kable(digits = 5, align = "c")

cor_results
| state | corr | p_value |
|---|---|---|
| AK | 0.05783 | 0.70924 |
| AL | 0.01064 | 0.94533 |
| AR | 0.01423 | 0.92693 |
| AZ | 0.06132 | 0.69254 |
| CA | -0.00830 | 0.95734 |
| CO | -0.02281 | 0.88315 |
| CT | 0.03076 | 0.84290 |
| DE | 0.02194 | 0.88759 |
| FL | -0.07174 | 0.65999 |
| GA | -0.00238 | 0.98777 |
| HI | -0.11770 | 0.44673 |
| IA | 0.02337 | 0.88032 |
| ID | -0.00193 | 0.99009 |
| IL | 0.00651 | 0.96656 |
| IN | -0.07432 | 0.63162 |
| KS | -0.02003 | 0.89729 |
| KY | 0.02177 | 0.88845 |
| LA | -0.00176 | 0.99095 |
| MA | -0.01240 | 0.93634 |
| MD | 0.03159 | 0.83867 |
| ME | 0.00038 | 0.99807 |
| MI | -0.03001 | 0.84668 |
| MN | -0.01584 | 0.91874 |
| MO | -0.01098 | 0.94361 |
| MS | 0.04062 | 0.79347 |
| MT | -0.06500 | 0.67509 |
| NC | 0.01004 | 0.94843 |
| ND | 0.02529 | 0.87057 |
| NE | -0.00178 | 0.99084 |
| NH | 0.03758 | 0.80865 |
| NJ | 0.02442 | 0.88108 |
| NM | 0.05329 | 0.75411 |
| NV | -0.00966 | 0.95036 |
| NY | -0.04588 | 0.76745 |
| OH | -0.01605 | 0.91765 |
| OK | 0.01914 | 0.90185 |
| OR | 0.00184 | 0.99056 |
| PA | 0.02032 | 0.89585 |
| RI | 0.02585 | 0.86772 |
| SC | 0.02651 | 0.86438 |
| SD | -0.02918 | 0.85084 |
| TN | 0.03026 | 0.84540 |
| TX | 0.00805 | 0.95866 |
| UT | -0.05871 | 0.70503 |
| VA | 0.00848 | 0.95642 |
| VT | -0.02405 | 0.87683 |
| WA | -0.03590 | 0.81706 |
| WI | -0.05946 | 0.70143 |
| WV | 0.05704 | 0.71304 |
| WY | 0.01313 | 0.93258 |
# Static version of the per-state trend table
state_trends =
  final_df |>
  group_by(state) |>
  summarise(model = list(lm(prevalence_percent ~ year))) |>
  mutate(model_summary = map(model, broom::tidy)) |>
  unnest(model_summary) |>
  # Keep only the year coefficient, i.e. the slope of the trend
  filter(term == "year") |>
  select(
    state,
    slope = estimate,
    p_value = p.value
  ) |>
  st_drop_geometry() |>
  knitr::kable(digits = 5)

state_trends
| state | slope | p_value |
|---|---|---|
| AK | 0.06182 | 0.00868 |
| AL | 0.16455 | 0.00001 |
| AR | 0.02909 | 0.25949 |
| AZ | 0.06091 | 0.00049 |
| CA | 0.02636 | 0.29868 |
| CO | 0.16364 | 0.00000 |
| CT | 0.09364 | 0.00000 |
| DE | 0.04727 | 0.16551 |
| FL | -0.05152 | 0.09681 |
| GA | 0.00455 | 0.86590 |
| HI | -0.06182 | 0.04633 |
| IA | 0.08273 | 0.00179 |
| ID | 0.10000 | 0.00000 |
| IL | 0.03455 | 0.07010 |
| IN | 0.02182 | 0.28358 |
| KS | 0.18364 | 0.00000 |
| KY | 0.05545 | 0.17836 |
| LA | 0.19909 | 0.00000 |
| MA | -0.00909 | 0.76559 |
| MD | 0.04636 | 0.00314 |
| ME | 0.01455 | 0.59673 |
| MI | 0.09091 | 0.00002 |
| MN | 0.11364 | 0.00000 |
| MO | -0.05727 | 0.01216 |
| MS | 0.22636 | 0.00000 |
| MT | 0.11455 | 0.00003 |
| NC | 0.03455 | 0.19492 |
| ND | 0.03182 | 0.09917 |
| NE | 0.10545 | 0.00000 |
| NH | 0.14909 | 0.00023 |
| NJ | -0.00400 | 0.87883 |
| NM | 0.04292 | 0.17131 |
| NV | 0.19000 | 0.00000 |
| NY | -0.01636 | 0.41568 |
| OH | 0.01727 | 0.47857 |
| OK | 0.12545 | 0.00000 |
| OR | 0.05091 | 0.01235 |
| PA | 0.10091 | 0.00000 |
| RI | 0.09364 | 0.00184 |
| SC | 0.12182 | 0.00000 |
| SD | 0.09909 | 0.00123 |
| TN | 0.30727 | 0.00000 |
| TX | 0.07182 | 0.00022 |
| UT | 0.14455 | 0.00000 |
| VA | 0.05636 | 0.00440 |
| VT | 0.05636 | 0.01882 |
| WA | 0.04818 | 0.00625 |
| WI | 0.10182 | 0.00312 |
| WV | 0.32091 | 0.00000 |
| WY | 0.09182 | 0.00036 |